Carlos Cámara-Menoyo

Senior Research Software Engineer,

Centre for Interdisciplinary Methodologies

Greg McInerny

Associate Professor, Degree Convenor of the MASc in Data Visualisation,

Centre for Interdisciplinary Methodologies

G.McInerny@warwick.ac.uk

Preamble

Visualising Food-Energy-Water nexus

I met Greg on a project where we were trying to understand how a particular type of complex system (the Food-Energy-Water nexus) was visualised, and to create a representation of it.

Representing & visualising

a system is not trivial!

Conceptualising and representing systems as a network is an option.

Learning outcomes

Theory:

  • Have a basic understanding about what networks are

    • As an object/model

    • As a communication tool

Practice:

  • Turning our data into networks

  • Using dedicated software (Gephi) to visualise, query and understand our data

Disclaimer

This is not a course on network analysis. Just visualisation/representation.

Overview

  1. Key concepts
  2. Guided tasks: I will demonstrate how to use the software and you will replicate the same steps
  3. Bring Your Own Data (BYOD) Lab: here, you will be working autonomously with the dataset you’ve been producing.

Key Concepts

Networks

Sets of entities (nodes) linked/connected by relationships (edges) that allow interaction, flow, or influence among them.

A network is a structure in which two or more objects (vertices/nodes) are linked to one another by some kind of relationship via an edge (or link).

  • Vertices can be connected to one or more vertices or unconnected (0 edges)

Königsberg circa 1736. Streets, bridges… can be considered networks.

Graphs

A graph is a representation of a network. That is, it is how networks are stored and represented.

The bare minimum information that a graph needs is:

  • Nodes (or vertices): usually identified with

    • a unique ID: a machine-readable unique identifier, shorter and unique for every node.

    • a Label: a human-readable, longer text

    • (optional, but useful): a number of attribute fields describing the node

  • Edges (or links): defined as a pair of columns

    • source: the id of the node where the edge starts

    • target: the id of the node where the edge ends

    • optionally, we may want to add ids to edges, labels, and attributes.
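As a minimal sketch of the structure described above, the two tables can be written as plain Python lists. The ids, labels and weights are copied from the Star Wars example tables in this deck; the variable and function names (`nodes`, `edges`, `describe`) are only illustrative.

```python
# A graph as two tables: one for nodes, one for edges.
# Values are copied from the Star Wars example tables.
nodes = [
    {"Id": 70, "Label": "han"},
    {"Id": 64, "Label": "luke"},
    {"Id": 67, "Label": "leia"},
]
edges = [
    {"Source": 70, "Target": 64, "Weight": 43},
    {"Source": 70, "Target": 67, "Weight": 69},
]

# The Id is for the machine, the Label is for the reader.
labels = {node["Id"]: node["Label"] for node in nodes}


def describe(edge):
    """Return a human-readable description of one edge."""
    source = labels[edge["Source"]]
    target = labels[edge["Target"]]
    return f"{source} -- {target} (weight {edge['Weight']})"


# Every edge must point at ids that exist in the nodes table.
assert all(e["Source"] in labels and e["Target"] in labels for e in edges)

readable = [describe(edge) for edge in edges]
print(readable)  # ['han -- luke (weight 43)', 'han -- leia (weight 69)']
```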

Euler

Let’s see some examples:

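Table 1: Top 10 nodes, according to scenes.

| Id | Label | scenes | affiliations | locations | species.droids | episode1 | episode2 | episode3 | episode4 | episode5 | episode6 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 70 | han | 169 | Rebel Alliance | Other | Human | 0 | 0 | 0 | 1 | 1 | 1 |
| 64 | luke | 160 | Other | Tatooine | Human | 0 | 0 | 1 | 1 | 1 | 1 |
| 21 | c-3po | 150 | Rebel Alliance | Tatooine | Droid | 1 | 1 | 1 | 1 | 1 | 1 |
| 4 | obi-wan | 147 | Jedi Order | Other | Human | 1 | 1 | 1 | 1 | 1 | 1 |
| 17 | anakin | 131 | Jedi Order | Tatooine | Human | 1 | 1 | 1 | 0 | 0 | 1 |
| 67 | leia | 98 | Rebel Alliance | Alderaan | Human | 0 | 0 | 0 | 1 | 1 | 1 |
| 14 | padme | 74 | Other | Naboo | Human | 1 | 1 | 1 | 0 | 0 | 0 |
| 94 | finn | 63 | The Resistance | Jakku | Human | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | qui-gon | 61 | Jedi Order | Other | Human | 1 | 0 | 0 | 0 | 0 | 0 |
| 63 | darth vader | 58 | Sith | Death Star | Human | 0 | 0 | 1 | 1 | 1 | 1 |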

And the edges’ attributes…

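Table 2: Example of interactions initiated by Han Solo.

| Source | Target | Type | Id | Label | Weight |
|---|---|---|---|---|---|
| 70 | 4 | Undirected | 653 | NA | 10 |
| 70 | 64 | Undirected | 654 | NA | 43 |
| 70 | 26 | Undirected | 656 | NA | 1 |
| 70 | 67 | Undirected | 660 | NA | 69 |
| 70 | 77 | Undirected | 680 | NA | 3 |
| 70 | 79 | Undirected | 683 | NA | 1 |
| 70 | 85 | Undirected | 697 | NA | 12 |
| 70 | 57 | Undirected | 719 | NA | 1 |
| 70 | 96 | Undirected | 749 | NA | 17 |
| 70 | 101 | Undirected | 756 | NA | 4 |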
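
Table 3: Example of interactions with Han Solo.

| Source | Target | Type | Id | Label | Weight |
|---|---|---|---|---|---|
| 27 | 70 | Undirected | 655 | NA | 1 |
| 21 | 70 | Undirected | 657 | NA | 54 |
| 63 | 70 | Undirected | 701 | NA | 2 |
| 45 | 70 | Undirected | 705 | NA | 1 |
| 88 | 70 | Undirected | 715 | NA | 1 |
| 89 | 70 | Undirected | 720 | NA | 2 |
| 94 | 70 | Undirected | 750 | NA | 23 |
| 99 | 70 | Undirected | 751 | NA | 2 |
| 104 | 70 | Undirected | 774 | NA | 1 |
| 93 | 70 | Undirected | 781 | NA | 2 |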

Systems, networks, graphs

Not all networks are systems, but systems can be abstracted as graphs and represented as networks

| Dimension | System | Network | Graph |
|---|---|---|---|
| Definition | An organised collection of physically or conceptually interrelated components that together achieve a purpose or exhibit emergent behaviour. | A set of entities (nodes) linked by relationships (edges) that allow interaction, flow, or influence among them. | Mathematical model to store data. |
| Keyword | Meaning | Representation | Abstraction |
| Scope / Purpose | Emphasises function (inputs → outputs, goals, control). | Emphasises interaction (flow of information, material, influence). | Abstraction. Purely structural. |
| Examples | Solar system, digestive system, surveillance system | Families and friends, social media interactions, publication co-citations… | Files in *.graphml or *.gexf format, a pair of .csv files containing nodes and edges… |
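To make the "pair of .csv files" example concrete, here is a minimal sketch that serialises a nodes table and an edges table as CSV text with Python's standard `csv` module. The column names follow the Star Wars tables in this deck; the helper name `to_csv` is hypothetical, and the text is kept in memory rather than written to disk.

```python
import csv
import io

# Two tiny tables, with values copied from the Star Wars example.
nodes = [{"Id": 70, "Label": "han"}, {"Id": 64, "Label": "luke"}]
edges = [{"Source": 70, "Target": 64, "Type": "Undirected", "Weight": 43}]


def to_csv(rows):
    """Serialise a list of dicts as CSV text; the first row's keys become the header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


nodes_csv = to_csv(nodes)
edges_csv = to_csv(edges)
print(nodes_csv)
print(edges_csv)
```

Writing each string to its own file would give the nodes and edges pair that a spreadsheet import expects.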

Software

We can use dedicated software to create, analyse, visualise and/or export graphs:

  • Gephi
  • retina: online, interactive data visualisation
  • igraph (via R or Python) (out of scope)

Gephi

Gephi is an open-source network visualisation and analysis software. It is available for Windows, macOS and Linux, and is widely used in academia and industry.

There are two variants:

  1. Gephi Desktop: a standalone application that you can install on your computer and that offers the full functionality of Gephi.
  2. Gephi Lite: a web-based version with limited functionality.

Gephi logo

Gephi interface

Guided task

Using a provided network, we will use Gephi to visualise it and gain insight.

Querying the dataset

We may want to use this data to get a better understanding of the dynamics between the characters:

  • Who are the most important characters?
    • Overall, vs per episode
    • Overall, dark vs light side of the Force
  • Who is interacting with the most important characters?
  • How often do characters from the same planet interact with each other?
    • Do they need a protagonist to mediate?
  • How similar/different are the interactions between characters in the light side of the Force vs dark side?

Loading a network in Gephi

From Gephi desktop:

  1. Create a new project. Go to File -> New Project to create a new Gephi project file. Alternatively, use the link in the welcome dialog that pops up when you first open Gephi.

  2. Click on File and select Import Spreadsheet.

  3. Choose the starwars-nodes.csv file and click Open.

In the import dialogue box, check that everything looks OK. You may also want to check that the data types are correct for each column.

  4. Click on Next and Finish to see the import report.

The dataset needs at least two columns, named Id and Label; otherwise an error will be triggered.

  5. Repeat the import process above, this time selecting the starwars_edges.csv file. Gephi should recognise the file as an edges table.

Make sure to append this to the existing workspace.
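Before importing, it can help to check the column names yourself. The sketch below is a hypothetical pre-flight check, not part of Gephi: it reads only the header row of a CSV and reports which required columns are missing. The comparison is case-sensitive, which is an assumption and may be stricter than Gephi itself; requiring Source and Target for the edges table is likewise an assumption based on the edge tables shown earlier.

```python
import csv
import io

# Column names taken from the tables in this deck.
REQUIRED_NODE_COLUMNS = {"Id", "Label"}
REQUIRED_EDGE_COLUMNS = {"Source", "Target"}  # assumption, see lead-in


def missing_columns(csv_text, required):
    """Return the required column names that are absent from the CSV header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # empty input gives an empty header
    return sorted(required - set(header))


good_nodes = "Id,Label,scenes\n70,han,169\n"
bad_nodes = "id,name\n70,han\n"

print(missing_columns(good_nodes, REQUIRED_NODE_COLUMNS))  # []
print(missing_columns(bad_nodes, REQUIRED_NODE_COLUMNS))   # ['Id', 'Label']
```

For a file on disk, the same function can be fed the file's contents, for example via `open(path, newline="").read()`.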

This is how our network looks by default. Pretty unexciting, and difficult to understand.

The data lab

Spend time exploring the data: What do we know about each character? Are we missing any important character?

Our dataset, as seen in Gephi’s Data Laboratory tab

Do characters from different factions interact with each other?

We will be improving the network’s appearance to visually answer these questions.

Task: Adding labels

Turning labels ON

It looks worse than before!

Now it is even more cluttered than before and it is difficult to read or find meaningful information!

Network Layout

Gephi has layout algorithms that we can use to explore the shapes within the network.

Each of them creates a visualisation that is easier to read and manipulate, but each also embeds assumptions about how the nodes are linked by the edges. Some of them are:

  • Force Atlas: suited to small-world and scale-free networks, and useful for exploring the network as it doesn’t introduce biases when plotting.
  • Fruchterman-Reingold: useful to understand the topology of the graph, as topologically near nodes are placed in the same vicinity, and far nodes are placed far from each other. Disconnected components are thus easy to visualize.
  • OpenOrd: this layout expects an undirected, weighted graph, and is very useful to detect clusters.
  • Circular Layout: this layout is simple but powerful: it orders the nodes by any metric or attribute you can think of. You can use it to visualize the distribution of the nodes and their links.
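To illustrate the idea behind a circular layout, here is a minimal sketch that orders nodes by an attribute and spaces them evenly on a circle. It is not Gephi's implementation; the function name `circular_layout` and the radius are assumptions, and the scene counts are copied from Table 1.

```python
import math


def circular_layout(node_ids, radius=1.0):
    """Place nodes evenly on a circle, in the order given."""
    n = len(node_ids)
    positions = {}
    for i, node_id in enumerate(node_ids):
        angle = 2 * math.pi * i / n
        positions[node_id] = (radius * math.cos(angle), radius * math.sin(angle))
    return positions


# Number of scenes per character (from Table 1).
scenes = {"han": 169, "luke": 160, "c-3po": 150, "obi-wan": 147}

# Order the nodes by the chosen attribute, largest first, then lay them out.
ordered = sorted(scenes, key=scenes.get, reverse=True)
positions = circular_layout(ordered)

for name in ordered:
    x, y = positions[name]
    print(f"{name}: ({x:.2f}, {y:.2f})")
```

Changing the sort key is all it takes to order the circle by a different attribute or metric.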

Some layout algorithms available in Gephi. Some of them may require extra plugins (e.g. Geolayout).

Task: Change the network layout

From the Layout pane, choose several layouts and experiment with them:

  1. Change layout types
  2. Adjust properties
  3. Choose one that you’re comfortable with
  4. You can manually reposition some nodes, too! (Just click and drag)

Star Wars graph with a Force Atlas 2 layout

Task: Change nodes’ appearance

Nodes, coloured by an attribute

Who are the most important characters?

Task: Visual inspection

A visual inspection of the network can help us make quick decisions, but that will not always be possible, nor will it always be the best way to do it.

Changing the nodes’ size

We can make more important nodes bigger than others.

But, how can we define importance?

A possible approach could be using some of the existing attributes.

  1. Visit the Data Laboratory tab and look for an attribute that we could use to show importance
  2. On the Overview tab, choose the Appearance pane
  3. From the Nodes tab within the Appearance pane, click on the size icon
  4. Select Ranking
  5. Choose the numeric attribute you want to use to set the size
  6. Set the minimum and maximum sizes. Nodes with intermediate attribute values will be sized by linear interpolation.
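The linear interpolation in the last step can be sketched as follows. The function name `rank_sizes` and the size range of 10 to 50 are assumptions chosen for illustration; the scene counts come from Table 1.

```python
def rank_sizes(values, min_size=10.0, max_size=50.0):
    """Map each attribute value linearly onto the range [min_size, max_size]."""
    lowest, highest = min(values.values()), max(values.values())
    if highest == lowest:
        # All values are equal: nothing to rank, so use the minimum size.
        return {key: min_size for key in values}
    span = highest - lowest
    return {
        key: min_size + (value - lowest) / span * (max_size - min_size)
        for key, value in values.items()
    }


# Number of scenes per character (from Table 1).
scenes = {"han": 169, "leia": 98, "darth vader": 58}
sizes = rank_sizes(scenes)

for name, size in sizes.items():
    print(f"{name}: {size:.2f}")
# han: 50.00
# leia: 24.41
# darth vader: 10.00
```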

Changing Node Size matching an attribute

Here, the dataset represents interactions between characters, not number of scenes!

Importance should be based on interactions, but that’s not an attribute on the nodes’ table!

Concept: Degree Centrality

The links between the nodes provide a structure for the network, but we need to be able to critically read them.

Reading and interpreting the edges is a vital skill in being able to interpret networks.

Are all nodes equally important?

Types of degrees:

  • In-degree: the number of edges (links) coming into a particular node. Incoming links are passive, as they show what links to the node in question.
  • Out-degree: the number of links that a node generates. These show us what a particular entity actively links to.
  • Degree: the total number of links coming into or leaving a node. In a directed graph, it is the sum of the in-degree and the out-degree. In an undirected graph, it is the only measure we have, as there’s no such distinction.
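The three measures can be computed directly from an edge list. The sketch below uses a small, made-up directed graph (the Star Wars graph is undirected, so it would only have a plain degree); the node names and the function name `degrees` are hypothetical.

```python
from collections import Counter


def degrees(edges):
    """Compute in-degree, out-degree and total degree from (source, target) pairs."""
    in_degree, out_degree = Counter(), Counter()
    for source, target in edges:
        out_degree[source] += 1  # the edge leaves the source
        in_degree[target] += 1   # the edge arrives at the target
    all_nodes = set(in_degree) | set(out_degree)
    return {
        node: {
            "in": in_degree[node],
            "out": out_degree[node],
            "degree": in_degree[node] + out_degree[node],
        }
        for node in all_nodes
    }


# Made-up directed edges, for illustration only.
toy_edges = [("a", "b"), ("a", "c"), ("b", "c")]
result = degrees(toy_edges)

print(result["a"])  # {'in': 0, 'out': 2, 'degree': 2}
print(result["c"])  # {'in': 2, 'out': 0, 'degree': 2}
```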

Concept: Weighted degrees

Are all the edges equally important?

Weighted degree takes into account each edge’s importance (weight).

Edges table, showing the Weight attribute. In this case, weight refers to the number of times two characters interact with each other.
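As a worked example, the sketch below computes degree and weighted degree for an undirected edge list. The rows are a small subset copied from Tables 2 and 3, so the totals shown are for this subset only and are lower than what Gephi reports for the full dataset.

```python
from collections import defaultdict

# (source, target, weight) rows copied from Tables 2 and 3.
# Only a subset of Han's edges, for illustration.
edges = [
    (70, 4, 10),
    (70, 64, 43),
    (70, 67, 69),
    (21, 70, 54),
    (94, 70, 23),
]

degree = defaultdict(int)
weighted_degree = defaultdict(int)

# The graph is undirected, so each edge counts for both of its endpoints.
for source, target, weight in edges:
    for node in (source, target):
        degree[node] += 1                # one more edge
        weighted_degree[node] += weight  # add that edge's weight

print(degree[70], weighted_degree[70])  # 5 199
print(degree[67], weighted_degree[67])  # 1 69
```

Han (id 70) and Leia (id 67) differ much more in weighted degree than the edge between them alone would suggest, which is why the two measures can rank characters differently.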

Task: Computing degrees

  1. On the Statistics pane, click on Run next to “Average Degree”
  2. Go to the Data Laboratory and see what has changed in the nodes tab

On the Statistics pane, click on “Run” next to “Average Degree”

New options available!

Now that we know the degree centrality of each character, we can do several things:

  1. Check the attributes table to see which are the most important characters
  2. Filter the data
  3. Change the size of the nodes to visually display centrality

Task: Identify the most important characters

From the data laboratory, we can sort the nodes’ table by Degree/Weighted degree:

Nodes, sorted by degree

Nodes, sorted by weighted degree

Task: Filtering network

We will use the filter to hide irrelevant (i.e. unconnected) characters.

Adding a “Degree” filter to hide unconnected nodes (degree 0). We could increase the minimum degree.
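The effect of a minimum-degree filter can be sketched in a few lines. This is an illustration of the idea, not Gephi's implementation: it applies the threshold once and then drops any edge that lost an endpoint. The function name `filter_by_degree` and node id 999 (an unconnected node) are made up; the other ids come from the Star Wars tables.

```python
def filter_by_degree(node_ids, edges, min_degree=1):
    """Keep nodes whose degree is at least min_degree, and the edges between them."""
    degree = {node: 0 for node in node_ids}
    for source, target in edges:
        degree[source] = degree.get(source, 0) + 1
        degree[target] = degree.get(target, 0) + 1
    kept_nodes = {node for node, d in degree.items() if d >= min_degree}
    kept_edges = [(s, t) for s, t in edges if s in kept_nodes and t in kept_nodes]
    return kept_nodes, kept_edges


node_ids = [70, 64, 67, 999]   # 999 is a made-up, unconnected node
edges = [(70, 64), (70, 67)]

connected, connected_edges = filter_by_degree(node_ids, edges, min_degree=1)
print(sorted(connected))  # [64, 67, 70]

core, core_edges = filter_by_degree(node_ids, edges, min_degree=2)
print(sorted(core), core_edges)  # [70] []
```

Raising the threshold from 1 to 2 leaves only Han, and with no second endpoint left, none of his edges survive either.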

Task: Make nodes proportional to degree

Node sizes proportional to their degree

Who interacts with the most important characters?

We are going to take this a step further.

Task: Ego Filter

Ego Filter
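An ego filter keeps one chosen node (the ego) and everything within a given number of steps from it. The sketch below shows that idea on a made-up undirected toy graph; the function name `ego_network` and the `depth` parameter are assumptions for illustration, not Gephi's API.

```python
def ego_network(edges, ego, depth=1):
    """Return the set of nodes within `depth` steps of `ego` in an undirected graph."""
    neighbours = {}
    for source, target in edges:
        neighbours.setdefault(source, set()).add(target)
        neighbours.setdefault(target, set()).add(source)

    selected = {ego}
    frontier = {ego}
    for _ in range(depth):
        # Expand one step outwards, skipping nodes already selected.
        frontier = {n for node in frontier for n in neighbours.get(node, set())} - selected
        selected |= frontier
    return selected


# Made-up toy graph: a chain a-b-c-d plus a separate pair e-f.
toy_edges = [("a", "b"), ("b", "c"), ("c", "d"), ("e", "f")]

print(sorted(ego_network(toy_edges, "a", depth=1)))  # ['a', 'b']
print(sorted(ego_network(toy_edges, "a", depth=2)))  # ['a', 'b', 'c']
```

With a main character as the ego and a depth of 1, the result is exactly the set of characters who interact with that character directly.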

Exporting the visualisations

By now, we have a decent visualisation, but what if we want to improve it further? (e.g. adding titles and annotations, selective labels, composites, small multiples…)

Gephi can’t do much more, but other editing software can!

The Preview pane

From here, we can export the graph as a raster or vector image!

Wrap up

After this session, we’ve learnt:

  • What Graphs and Networks are

  • How to create a graph from two datasets

  • How to visualise a graph within Gephi, using

    • labels

    • colours and sizes

    • layouts

    • filters

  • How to export a network as SVG

  • How to get a renewed understanding of our data

Your turn!

  1. Use Gephi to create meaningful visualisations from your datasets
    1. You may need to edit your data
  2. Use this dataset to try to address other questions:
    1. Who are the most important characters per episode? / Dark vs Light side of the Force?

    2. How often do characters from the same planet interact with each other?

      1. Do they need a protagonist to mediate?
    3. How similar/different are the interactions between characters in the light side of the Force vs dark side?